A large-scale study of the World Wide Web: network correlation functions with scale-invariant boundaries

Authors

  • Guillermo A. Ludueña
  • H. Meixner
  • Gregor Kaczor
  • Claudius Gros
Abstract

We performed a large-scale crawl of the World Wide Web, covering 6.9 million domains, including all high-traffic sites of the Internet. We present a study of the correlations found between quantities measuring the structural relevance of each node in the network (the in- and out-degree, the local clustering coefficient, the first-neighbor in-degree and the Alexa rank). We find that some of these properties show strong correlation effects and that the dependencies arising from these correlations follow power laws not only for the averages, but also for the boundaries of the respective density distributions. In addition, these scale-free limits do not follow the same exponents as the corresponding averages. In our study we retain the directionality of the hyperlinks and develop a statistical estimate for the clustering coefficient of directed graphs. We include in our study the correlations between the in-degree and the Alexa traffic rank, a popular index for the traffic volume, finding non-trivial power-law correlations. We find that sites with more than about 10 links from different domains have remarkably different statistical properties from sites with fewer such links, for all correlation functions studied, pointing towards an underlying hierarchical structure of the World Wide Web.
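The distinction the abstract draws between power laws for averages and power laws for boundaries can be illustrated with a minimal sketch. This is not the authors' procedure: the data below are synthetic, the helper names `loglog_slope` and `binned_mean_and_upper` are hypothetical, and the per-bin maximum is only an assumed stand-in for the upper boundary of a density distribution.

```python
import math
import random
from collections import defaultdict


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. a power-law exponent estimate."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


def binned_mean_and_upper(pairs):
    """Group y by x and return (sorted xs, mean of y per x, max of y per x)."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    xs = sorted(groups)
    means = [sum(groups[x]) / len(groups[x]) for x in xs]
    upper = [max(groups[x]) for x in xs]
    return xs, means, upper


if __name__ == "__main__":
    # Synthetic stand-in for (in-degree, some node quantity) pairs:
    # y is spread between x**0.2 and x**1.0, so the spread widens with x.
    rng = random.Random(0)
    pairs = []
    for x in (2, 4, 8, 16, 32, 64, 128, 256):
        for _ in range(500):
            pairs.append((x, x ** rng.uniform(0.2, 1.0)))

    xs, means, upper = binned_mean_and_upper(pairs)
    print("exponent of the average:       ", round(loglog_slope(xs, means), 2))
    print("exponent of the upper boundary:", round(loglog_slope(xs, upper), 2))
```

On data constructed this way the two fitted exponents differ, which is the qualitative effect the abstract reports; the actual exponents and the boundary definition used in the paper are not given on this page.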

Similar resources

Scale Invariance and Hierarchy in National Road Networks

Networks Edge cost not a function of spatial dimensions, e.g., • World Wide Web • Email • Interaction networks • proteins, genes, cell-signals, etc. • species (predator-prey, host-parasite) • Social networks (sort of) Spatial Networks Edge cost is function of spatial separation, e.g., • Communication • wired and wireless transmission • Transportation • distributing resources or goods (pipes) • ...

Semantic Constraint and QoS-Aware Large-Scale Web Service Composition

Service-oriented architecture facilitates the running time of interactions by using business integration on the networks. Currently, web services are considered as the best option to provide Internet services. Due to an increasing number of Web users and the complexity of users’ queries, simple and atomic services are not able to meet the needs of users; and to provide complex services, it requ...

A topology visualisation tool for large-scale communications networks

Introduction It is relevant to study the topology of information and communications networks, such as the Internet, the World Wide Web and peer-to-peer (P2P) overlay networks, because structure fundamentally affects function. These networks contain thousands or even millions of connections and their topologies are usually characterised by statistics [1, 2]. Here we introduce a simple tool to vi...

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with each other are often complex and form a network. Discovery of this network of technologies is time consumi...

Independence of an Equivariant and Invariant Functions in Generalized Normal Family

In this paper we explain a necessary and sufficient condition for independence between an arbitrary statistic and the sufficient statistic, which is also the maximum likelihood estimator, in a general exponential family with location and scale parameters, namely the generalized normal distribution. At the end, it is shown that the converse is true except in the asymptotic cases.

Journal title:
  • CoRR

Volume abs/1212.0749  Issue 

Pages  -

Publication date 2012